
Coresets via Bilevel Optimization for Continual Learning and Streaming

Borsos, Zalán, Mutný, Mojmír, Krause, Andreas

arXiv.org Machine Learning

Coresets are small data summaries that are sufficient for model training. They can be maintained online, enabling efficient handling of large data streams under resource constraints. However, existing constructions are limited to simple models such as k-means and logistic regression. In this work, we propose a novel coreset construction via cardinality-constrained bilevel optimization. We show how our framework can efficiently generate coresets for deep neural networks, and demonstrate its empirical benefits in continual learning and in streaming settings.
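To make the bilevel idea concrete, here is a toy numpy sketch (illustrative only, not the paper's implementation): the coreset is grown by greedy forward selection under a cardinality constraint, the inner problem is a closed-form ridge regression trained on the candidate coreset, and the outer objective is the loss of that inner solution on the full dataset. All names and the linear-regression setup are assumptions chosen so the sketch stays runnable.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.1 * rng.normal(size=200)

def inner_solve(idx, lam=1e-3):
    # Inner problem: ridge regression trained only on the coreset indices.
    Xs, ys = X[idx], y[idx]
    d = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + lam * np.eye(d), Xs.T @ ys)

def outer_loss(theta):
    # Outer objective: loss of the inner solution on the full dataset.
    return np.mean((X @ theta - y) ** 2)

def greedy_coreset(m):
    # Cardinality-constrained selection, relaxed to greedy forward selection:
    # at each step, add the point whose inclusion most reduces the outer loss.
    selected, remaining = [], list(range(len(X)))
    best_loss = np.inf
    for _ in range(m):
        best, best_loss = None, np.inf
        for i in remaining:
            loss = outer_loss(inner_solve(selected + [i]))
            if loss < best_loss:
                best, best_loss = i, loss
        selected.append(best)
        remaining.remove(best)
    return selected, best_loss

core, loss_core = greedy_coreset(10)
print("coreset:", core, "outer loss:", loss_core)
```

With a 5-dimensional linear model, a handful of well-chosen points already pins down the inner solution, so the 10-point coreset's outer loss approaches that of training on all 200 points.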


Learning Discrete Structures for Graph Neural Networks

Franceschi, Luca, Niepert, Mathias, Pontil, Massimiliano, He, Xiao

arXiv.org Machine Learning

Relational learning is concerned with methods that can not only leverage the attributes of data points but also the relationships among them. Diagnosing a patient, for example, depends not only on the patient's vitals and demographic information but also on the same information about their relatives, the hospitals they have visited, and so on. Relational learning therefore does not assume independence between data points but models their dependencies explicitly. Graphs are a natural way to represent relational information, and many machine learning algorithms leverage graph structure. Graph neural networks (GNNs) (Scarselli et al., 2009) are one such class of algorithms, able to incorporate sparse and discrete dependency structures between data points. While a graph structure is available in some domains, in others it has to be inferred or constructed. A possible approach is to first build a k-nearest neighbor (kNN) graph based on some measure of similarity between the data points. This is a common strategy used by several learning methods, such as LLE (Roweis & Saul, 2000) and Isomap (Tenenbaum et al., 2000).
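As an illustration of the kNN-graph construction the abstract mentions (a minimal sketch, assuming Euclidean distance as the similarity measure, not the paper's method), the adjacency matrix can be built directly from pairwise distances:

```python
import numpy as np

def knn_graph(X, k):
    # Pairwise squared Euclidean distances via the expansion
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    np.fill_diagonal(d2, np.inf)          # exclude self-loops
    nbrs = np.argsort(d2, axis=1)[:, :k]  # k nearest neighbors per row
    A = np.zeros((len(X), len(X)), dtype=int)
    rows = np.repeat(np.arange(len(X)), k)
    A[rows, nbrs.ravel()] = 1             # directed edge i -> neighbor
    return A

# Two tight clusters: each point's nearest neighbor is its cluster partner.
X = np.array([[0.0], [0.1], [5.0], [5.1]])
print(knn_graph(X, k=1))
```

Note the resulting adjacency matrix is directed (each row has exactly k ones) and is often symmetrized, e.g. with `A = np.maximum(A, A.T)`, before being handed to a graph-based learner.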


Bilevel Programming for Hyperparameter Optimization and Meta-Learning

Franceschi, Luca, Frasconi, Paolo, Salzo, Saverio, Grazzi, Riccardo, Pontil, Massimiliano

arXiv.org Machine Learning

We introduce a framework based on bilevel programming that unifies gradient-based hyperparameter optimization and meta-learning. We show that an approximate version of the bilevel problem can be solved by explicitly taking into account the optimization dynamics of the inner objective. Depending on the specific setting, the outer variables take either the meaning of hyperparameters in a supervised learning problem or parameters of a meta-learner. We provide sufficient conditions under which solutions of the approximate problem converge to those of the exact problem. We instantiate our approach for meta-learning in the case of deep learning, where representation layers are treated as hyperparameters shared across a set of training episodes. In experiments, we confirm our theoretical findings, present encouraging results for few-shot learning, and contrast the bilevel approach against classical approaches for learning-to-learn.
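The core mechanism, differentiating the outer objective through the unrolled inner optimization dynamics, can be sketched in one dimension (a hypothetical toy problem, not the paper's setting): the inner loss is a regularized quadratic with hyperparameter `lam`, and the hypergradient of a validation-style outer loss is propagated through T gradient-descent steps by the chain rule.

```python
import numpy as np

# Inner objective:  L_in(theta, lam) = (theta - 1)^2 + lam * theta^2
# Outer objective:  L_out(theta)     = (theta - 0.5)^2
# We unroll T steps of inner gradient descent and carry d(theta)/d(lam)
# alongside theta, which yields the exact gradient of the unrolled map.

def unrolled_hypergradient(lam, theta0=0.0, eta=0.1, T=100):
    theta, dtheta_dlam = theta0, 0.0
    for _ in range(T):
        grad = 2 * (theta - 1) + 2 * lam * theta   # d L_in / d theta
        # Chain rule through the update theta <- theta - eta * grad:
        # d grad / d lam = (2 + 2*lam) * dtheta_dlam + 2 * theta
        dtheta_dlam = (1 - eta * (2 + 2 * lam)) * dtheta_dlam - eta * 2 * theta
        theta = theta - eta * grad
    outer = (theta - 0.5) ** 2
    hypergrad = 2 * (theta - 0.5) * dtheta_dlam    # d L_out / d lam
    return theta, outer, hypergrad

theta_T, outer, hg = unrolled_hypergradient(lam=0.3)
print("theta_T:", theta_T, "outer loss:", outer, "hypergradient:", hg)
```

Here the inner problem has the fixed point theta* = 1/(1 + lam), so at lam = 0.3 the outer loss still decreases as lam grows (the hypergradient is negative), and gradient descent on lam moves it toward the outer optimum at lam = 1. The same recursion is what reverse- or forward-mode differentiation computes automatically in the general case.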


Far-HO: A Bilevel Programming Package for Hyperparameter Optimization and Meta-Learning

Franceschi, Luca, Grazzi, Riccardo, Pontil, Massimiliano, Salzo, Saverio, Frasconi, Paolo

arXiv.org Machine Learning

In (Franceschi et al., 2018) we proposed a unified mathematical framework, grounded in bilevel programming, that encompasses gradient-based hyperparameter optimization and meta-learning. We formulated an approximate version of the problem, where the inner objective is solved iteratively, and gave sufficient conditions ensuring convergence to the exact problem. In this work we show how to optimize learning rates, automatically weight the losses of individual examples, and learn hyper-representations with Far-HO, a software package based on the popular deep learning framework TensorFlow that makes it possible to seamlessly tackle both HO and ML problems.


A Bridge Between Hyperparameter Optimization and Learning-to-learn

Franceschi, Luca, Donini, Michele, Frasconi, Paolo, Pontil, Massimiliano

arXiv.org Machine Learning

We consider a class of nested optimization problems involving inner and outer objectives. We observe that by explicitly taking into account the optimization dynamics of the inner objective, it is possible to derive a general framework that unifies gradient-based hyperparameter optimization and meta-learning (or learning-to-learn). Depending on the specific setting, the variables of the outer objective take either the meaning of hyperparameters in a supervised learning problem or parameters of a meta-learner. We show that some recently proposed methods in the latter setting can be instantiated in our framework and tackled with the same gradient-based algorithms. Finally, we discuss possible design patterns for learning-to-learn and present encouraging preliminary experiments for few-shot learning.